1 Data input

Let’s start with reading the computed metrics for all projects.

## [1] TRUE
## 'data.frame':    2835 obs. of  19 variables:
##  $ project         : Factor w/ 13 levels "black","cookiecutter",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ bug_number      : int  1 3 4 6 7 8 10 11 14 15 ...
##  $ granularity     : Factor w/ 3 levels "function","statement",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ technique       : Factor w/ 7 levels "DStar","Metallaxis",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ crashing        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ predicate       : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
##  $ ismutable       : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
##  $ mutability      : num  0 0 0 0.112 0.119 ...
##  $ time            : num  132.4 104.2 68.5 58.8 64.8 ...
##  $ einspect        : num  4 100.5 39.5 11 29 ...
##  $ is_bug_localized: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ exam            : num  0.0099 0.3018 0.1282 0.0364 0.0967 ...
##  $ java_exam_score : num  0.0099 0.3018 0.1282 0.0364 0.0967 ...
##  $ cdist           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ svcomp          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ minutes         : num  2.21 1.74 1.14 0.98 1.08 ...
##  $ family          : Factor w/ 4 levels "MBFL","PS","ST",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ category        : Factor w/ 4 levels "CL","DEV","DS",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ bugid           : Factor w/ 135 levels "black1","black10",..: 1 9 10 11 12 13 2 3 4 5 ...

We have data about 135 bugs in 13 analyzed projects.

2 Pairwise comparisons

Let’s see an example of visual and statistical comparison of two groups of experiments for the same bugs.

To make the example concrete, let’s pick two groups and compare their \(E_{\text{inspect}}\) scores on statement-level fault localization:

  • \(S\) are the experiments done with any SBFL technique
  • \(M\) are the experiments done with any MBFL technique

Since there are three experiments per bug using SBFL, but only two experiments per bug using MBFL, we’ll aggregate the scores of each group for the same bug by taking their average.
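The aggregation step can be sketched as follows. This is a minimal illustration on made-up numbers, not the analysis code itself: the toy data frame and its values are hypothetical, and only the column names (`bugid`, `family`, `einspect`) follow the data shown in the previous section.

```r
# Toy data (hypothetical values): three SBFL and two MBFL experiments per bug
toy <- data.frame(
  bugid    = rep(c("black1", "black3"), each = 5),
  family   = rep(c("SBFL", "SBFL", "SBFL", "MBFL", "MBFL"), times = 2),
  einspect = c(4, 6, 5, 10, 20, 100, 90, 110, 95, 105)
)

# Average the scores of all experiments of the same family on the same bug
agg <- aggregate(einspect ~ bugid + family, data = toy, FUN = mean)

# One row per bug and one column per family, so that S and M are paired by bug
wide <- reshape(agg, idvar = "bugid", timevar = "family", direction = "wide")
S <- wide$einspect.SBFL
M <- wide$einspect.MBFL
```

After this step, `S[i]` and `M[i]` refer to the same bug, which is what the paired comparisons require.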

Let’s start with some visualization: a scatterplot with a point for each bug; each point has coordinates \(x, y\) where \(x\) is its score in MBFL and \(y\) its score in SBFL.

As you can see, there is a bulk of bugs for which SBFL performs very similarly to MBFL (points close to the \(x = y\) straight line). However, for several other bugs, SBFL is much better (remember that lower is better for this score).

Looking at the colors, we notice that bugs in the CL (and possibly DS) categories are overrepresented among the “harder” bugs on which SBFL behaves much better than MBFL.

Analyzing the same data numerically, we can compute the correlation (Kendall’s \(\tau\)) between \(S\) and \(M\):

## 
##  Kendall's rank correlation tau
## 
## data:  S and M
## z = 7.8047, p-value = 5.965e-15
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.5403952

A correlation of about 0.54 is not very strong, but it is clearly defined.
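For reference, a test like this one can be run in base R with `cor.test`. The sketch below uses hypothetical paired scores rather than the real data; only the names `S` and `M` follow the text.

```r
# Hypothetical per-bug average scores for the two groups (same bugs, same order)
S <- c(5, 100, 12, 40, 3, 60)
M <- c(15, 100.5, 10, 80, 9, 300)

# Kendall's tau counts concordant vs. discordant pairs, so it only depends on ranks
kt <- cor.test(S, M, method = "kendall")
tau <- unname(kt$estimate)
p_value <- kt$p.value
```

Because it is rank-based, Kendall's \(\tau\) is not distorted by the few bugs with very large scores, which is a reasonable property for skewed data like these.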

Finally, we may also perform a statistical test (Wilcoxon’s paired test) and compute a matching effect size (Cliff’s delta).

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  S and M
## V = 1068, p-value = 0.005209
## alternative hypothesis: true location shift is not equal to 0
## 
## Cliff's Delta
## 
## delta estimate: -0.1761866 (small)
## 95 percent confidence interval:
##       lower       upper 
## -0.29548868 -0.05147369

Cliff’s delta, in particular, measures how often the values in one set are larger than the values in the other set, minus how often they are smaller. Thus, a delta of about -0.18 means that the probability that SBFL’s \(E_{\text{inspect}}\) score is smaller than MBFL’s exceeds the probability that it is larger by roughly 18 percentage points.
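To make the definition of Cliff's delta concrete, here is a small sketch that computes it by hand next to a paired Wilcoxon test. The data are hypothetical, and `cliffs_delta` is an illustrative helper written for this example, not the function that produced the output above.

```r
# Hypothetical paired scores (one pair per bug)
S <- c(5, 100, 12, 40, 3, 60)
M <- c(15, 100.5, 10, 80, 9, 300)

# Wilcoxon signed-rank test on the paired differences S - M
wt <- wilcox.test(S, M, paired = TRUE)

# Cliff's delta from its definition: compare every value of x with every value of y
cliffs_delta <- function(x, y) {
  d <- outer(x, y, "-")                     # all pairwise differences x_i - y_j
  (sum(d > 0) - sum(d < 0)) / length(d)     # P(x > y) - P(x < y)
}
delta <- cliffs_delta(S, M)
```

A negative `delta` means that values in the first group tend to be smaller than values in the second; a value of 0 means that neither group dominates.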

These statistics, for what they’re worth, seem to confirm that there is a noticeable difference in favor of SBFL.

Now, let’s generalize this to a scatterplot matrix to show the relations between all possible pairs of FL families.

First, we define a bunch of helper functions.

Then, we use them to generate plots for \(E_{\text{inspect}}\).

Now, it’s easy to compute a similar plot for other metrics. For example, running time (in minutes):

3 Regression models

Let’s build a simple multivariate regression model, where we predict einspect and time from characteristics of the bug (the category of its project) and of the technique (its family).

First, we standardize the predictors, so that it’s much easier to set sensible priors.

3.1 Model \(m_1\): baseline multivariate regression

Here’s a basic regression model, where the only unusual aspects are that it’s multivariate, and log-transforms the mean (since both outcome variables must be nonnegative).

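library(brms)

# Multivariate model: both outcomes share the same predictors.
# "0 + ..." drops the global intercept, so each family gets its own coefficient.
eq.m1 <- brmsformula(
  mvbind(einspectS, timeS) ~ 0 + family + category,
  family=brmsfamily("gaussian", link="log")
) + set_rescor(TRUE)  # also estimate the residual correlation between outcomes

# List the parameters that accept a prior (for inspection only)
pp1.check <- get_prior(eq.m1, data=by.statement)

# Priors, set identically for both outcomes
pp1 <- c(
  set_prior("normal(0, 1.0)", class="b", resp=c("einspectS", "timeS")),
  set_prior("weibull(2, 1)", class="sigma", resp=c("einspectS", "timeS"))
)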
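Written out in mathematical notation, the formula and priors above correspond roughly to the following model. This is a sketch reconstructed from the brms code; the symbols \(\alpha\), \(\beta\), \(\rho\), and the superscripts \(e\) (einspectS) and \(t\) (timeS) are introduced here for illustration only.

\[
\begin{aligned}
(\text{einspectS}_i, \text{timeS}_i) &\sim \text{MVNormal}\big((\mu^{e}_i, \mu^{t}_i),\ \Sigma\big) \\
\log \mu^{e}_i &= \alpha^{e}_{\text{family}[i]} + \beta^{e}_{\text{category}[i]} \\
\log \mu^{t}_i &= \alpha^{t}_{\text{family}[i]} + \beta^{t}_{\text{category}[i]} \\
\alpha^{e}, \alpha^{t}, \beta^{e}, \beta^{t} &\sim \text{Normal}(0, 1) \\
\sigma^{e}, \sigma^{t} &\sim \text{Weibull}(2, 1)
\end{aligned}
\]

Here \(\Sigma\) is the residual covariance matrix built from \(\sigma^{e}\), \(\sigma^{t}\), and the residual correlation \(\rho\) enabled by set_rescor(TRUE). Since there is no global intercept, one category acts as the reference level, with its \(\beta\) fixed at zero.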

3.2 Fitting \(m_1\)

Let’s do the usual checks to make sure that everything is fine with the fitting.

Prior checks, confirming that the sampled priors span a wide range of values, amply including the data.
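The idea behind a prior predictive check can be sketched in base R, without fitting anything. The snippet below is an illustration under simplifying assumptions (a single observation per draw, one family coefficient plus one category coefficient); it is not the code used to produce the check. With brms, the same kind of check is usually done by sampling the model with `sample_prior = "only"`.

```r
set.seed(1)
n_draws <- 1000

# Draw parameters from the priors of m1
b_family   <- rnorm(n_draws, mean = 0, sd = 1)       # normal(0, 1.0)
b_category <- rnorm(n_draws, mean = 0, sd = 1)       # normal(0, 1.0)
sigma      <- rweibull(n_draws, shape = 2, scale = 1) # weibull(2, 1)

# Log link: the mean of the outcome is the exponential of the linear predictor
mu <- exp(b_family + b_category)

# One simulated outcome per prior draw
y_sim <- rnorm(n_draws, mean = mu, sd = sigma)

# Range covered by the simulated outcomes, to compare with the observed data
prior_range <- quantile(y_sim, probs = c(0.025, 0.5, 0.975))
```

If the observed (standardized) outcomes fall well inside `prior_range`, the priors are wide enough not to rule out the data before seeing it.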

Now we fit the actual model.

## Start sampling
## Running MCMC with 4 chains, at most 8 in parallel...
## 
## Chain 4 finished in 12.0 seconds.
## Chain 1 finished in 12.2 seconds.
## Chain 2 finished in 12.2 seconds.
## Chain 3 finished in 13.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 12.4 seconds.
## Total execution time: 13.4 seconds.

Next, we check the usual diagnostics:

  • No (or at most a few) divergent transitions
  • \(\widehat{R}\) ratio below \(1.01\)
  • Effective sample size (ESS), as a ratio of the total sample size, at least 10%
## [1] 0
## [1] 1.004918
## [1] 0.3498535
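To give an intuition of what \(\widehat{R}\) measures, here is a minimal sketch of the classic Gelman-Rubin statistic on simulated chains. Note the assumptions: `rhat_basic` is a hypothetical helper that implements the basic, non-split version, whereas the value reported by brms is a more robust rank-normalized variant, so the numbers are not directly comparable.

```r
# Classic (non-split) R-hat for a single parameter.
# draws: matrix with one row per iteration and one column per chain
rhat_basic <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  B <- n * var(colMeans(draws))         # between-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(42)
# Four well-mixed chains: all sample from the same distribution
good <- matrix(rnorm(4000), ncol = 4)
# Four chains stuck around different values
bad <- sweep(matrix(rnorm(4000), ncol = 4), 2, c(0, 1, 2, 3), "+")

rhat_good <- rhat_basic(good)  # close to 1
rhat_bad  <- rhat_basic(bad)   # clearly above 1.01
```

When chains agree, the pooled variance is about the same as the within-chain variance and the ratio stays near 1; when they disagree, the between-chain term inflates it.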

Finally, we check the posteriors, to ensure that we have a decent approximation of the data.

As you can see, the simulated posteriors are acceptable given that the data is complex, whereas the model is quite simplistic (we’ll improve it soon).

3.3 Analyzing \(m_1\)

##  Family: MV(gaussian, gaussian) 
##   Links: mu = log; sigma = identity
##          mu = log; sigma = identity 
## Formula: einspectS ~ 0 + family + category 
##          timeS ~ 0 + family + category 
##    Data: by.statement (Number of observations: 945) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## einspectS_familyMBFL     -2.93      0.50    -4.03    -2.06 1.00     4578
## einspectS_familyPS        0.43      0.08     0.27     0.58 1.00     4733
## einspectS_familyST        0.46      0.08     0.29     0.61 1.00     4045
## einspectS_familySBFL     -3.44      0.48    -4.45    -2.57 1.00     5027
## einspectS_categoryDEV    -2.63      0.48    -3.64    -1.81 1.00     4929
## einspectS_categoryDS     -0.67      0.14    -0.95    -0.42 1.00     3836
## einspectS_categoryWEB    -2.38      0.48    -3.43    -1.57 1.00     4807
## timeS_familyMBFL         -1.39      0.20    -1.82    -1.01 1.00     1415
## timeS_familyPS           -2.09      0.28    -2.70    -1.59 1.00     2049
## timeS_familyST           -3.75      0.45    -4.67    -2.96 1.00     3648
## timeS_familySBFL         -4.38      0.45    -5.31    -3.56 1.00     3381
## timeS_categoryDEV         0.96      0.27     0.43     1.47 1.00     1748
## timeS_categoryDS          1.49      0.21     1.08     1.91 1.00     1399
## timeS_categoryWEB        -0.95      0.66    -2.41     0.15 1.00     3819
##                       Tail_ESS
## einspectS_familyMBFL      2861
## einspectS_familyPS        2968
## einspectS_familyST        3075
## einspectS_familySBFL      2787
## einspectS_categoryDEV     2893
## einspectS_categoryDS      3079
## einspectS_categoryWEB     2670
## timeS_familyMBFL          1864
## timeS_familyPS            2727
## timeS_familyST            2896
## timeS_familySBFL          2580
## timeS_categoryDEV         2317
## timeS_categoryDS          2123
## timeS_categoryWEB         2669
## 
## Family Specific Parameters: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_einspectS     0.86      0.02     0.82     0.90 1.00     5808     3431
## sigma_timeS         0.93      0.02     0.88     0.97 1.00     4492     3001
## 
## Residual Correlations: 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## rescor(einspectS,timeS)     0.07      0.03     0.00     0.14 1.00     4829
##                         Tail_ESS
## rescor(einspectS,timeS)     2993
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

What’s noticeable here is that the residual correlation between the two outcomes einspect and time is quite small (0.07), which means that there is not much of a consistent dependency between these two variables.

Let’s set up some functions to analyze the posterior samples of m1 (and similar models).
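As an illustration of what such a function might look like, the sketch below builds a table of posterior intervals at several probability levels, with lower bounds labeled like `|0.95` and upper bounds like `0.95|`. It is a hypothetical helper: the name `interval_table`, the use of equal-tailed quantile intervals, and the simulated draws are all assumptions, and the actual functions may compute the intervals differently (for example, as highest-density intervals).

```r
# Hypothetical helper: equal-tailed posterior intervals at several levels.
# draws: matrix with one row per posterior draw and one named column per effect
interval_table <- function(draws, probs = c(0.5, 0.7, 0.9, 0.95, 0.99)) {
  # Quantile q of every column, as a named vector
  bound <- function(q) apply(draws, 2, quantile, probs = q, names = FALSE)
  lower <- do.call(rbind, lapply((1 - probs) / 2, bound))
  upper <- do.call(rbind, lapply(1 - (1 - rev(probs)) / 2, bound))
  rownames(lower) <- paste0("|", probs)        # lower bounds, narrowest first
  rownames(upper) <- paste0(rev(probs), "|")   # upper bounds, widest first
  rbind(lower, upper)
}

# Simulated draws standing in for two posterior coefficients
set.seed(7)
draws <- cbind(MBFL = rnorm(4000, -2, 0.5), SBFL = rnorm(4000, -2.5, 0.5))
tab <- interval_table(draws)
```

Reading such a table, an effect is negative with 95% probability when both `|0.95` and `0.95|` are below zero, and similarly for the other levels.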

Let’s use these functions to first analyze the effects per family of FL techniques.

## $ints
##             MBFL       PS       SBFL       ST
## |0.5  -1.8403733 1.750277 -2.2904133 1.787965
## |0.7  -1.9987933 1.723377 -2.4933733 1.755788
## |0.9  -2.3258633 1.679919 -2.8417033 1.699362
## |0.95 -2.5844133 1.656920 -2.9724333 1.667282
## |0.99 -2.9603633 1.589496 -3.3652433 1.613864
## 0.99| -0.4593433 2.011449 -0.9055133 2.031262
## 0.95| -0.6456633 1.958796 -1.1198133 1.988802
## 0.9|  -0.6952333 1.932004 -1.3104833 1.966496
## 0.7|  -0.9746733 1.881454 -1.5097333 1.920313
## 0.5|  -1.1869033 1.854129 -1.6453033 1.896548
## 
## $est
## NULL
## $ints
##            MBFL         PS       SBFL          ST
## |0.5  1.3932608 0.65172082 -1.7136392 -1.10199918
## |0.7  1.3372008 0.54789082 -1.8752192 -1.23743918
## |0.9  1.1835608 0.36650082 -2.2369192 -1.56690918
## |0.95 1.1137808 0.26052082 -2.3766392 -1.74514918
## |0.99 0.9678708 0.01876082 -2.6434592 -2.13414918
## 0.99| 2.0093678 1.45849082 -0.3948592  0.15032082
## 0.95| 1.9086418 1.33810082 -0.6333692 -0.03348918
## 0.9|  1.8495308 1.26726082 -0.7575192 -0.13368918
## 0.7|  1.7494308 1.10962082 -0.9557192 -0.30827918
## 0.5|  1.6572508 1.01740082 -1.1247692 -0.49940918
## 
## $est
## NULL

For both outcomes, einspect and time, there are clear differences (with high probability) in how the different families of techniques contribute to the mean.

Looking at the effects by category of project does not yield differences that are as strong, but we can see that DS projects tend to be associated with worse (higher) einspect and longer running times, and that WEB projects are associated with shorter running times.

## $ints
##              DEV        DS        WEB
## |0.5  -1.0046079 1.1276011 -0.7240479
## |0.7  -1.1353879 1.1107671 -0.9001379
## |0.9  -1.4761379 0.9893531 -1.2261379
## |0.95 -1.6771979 0.9489131 -1.4957179
## |0.99 -2.0851179 0.8482821 -1.8106379
## 0.99|  0.3128821 1.5513121  0.5550221
## 0.95|  0.1259921 1.4800181  0.3450021
## 0.9|   0.0804521 1.4342721  0.3041321
## 0.7|  -0.1538179 1.3848411  0.0720821
## 0.5|  -0.3640279 1.3044731 -0.0804679
## 
## $est
## NULL
## $ints
##               DEV       DS       WEB
## |0.5   0.27467305 0.845127 -1.681273
## |0.7   0.19445705 0.768187 -2.001653
## |0.9   0.03318005 0.625197 -2.491093
## |0.95 -0.04185295 0.585337 -2.839633
## |0.99 -0.24780795 0.456786 -3.446723
## 0.99|  1.16100705 1.535847 -0.122691
## 0.95|  0.99543705 1.413317 -0.307852
## 0.9|   0.88579705 1.310127 -0.358522
## 0.7|   0.73274705 1.196797 -0.681840
## 0.5|   0.62126705 1.125677 -0.834484
## 
## $est
## NULL

3.4 Model \(m_2\): multivariate varying effects

Let’s make the model more sophisticated by using varying effects, and let’s model these effects as possibly correlated (which makes sense, since we have two model parts).

# Varying intercepts by family and by category; the IDs p and q let brms
# estimate how each group's effects correlate across the two outcomes
eq.m2 <- brmsformula(
  mvbind(einspectS, timeS) ~ 1 + (1|p|family) + (1|q|category),
  family=brmsfamily("gaussian", link="log")
) + set_rescor(TRUE)

# List the parameters that accept a prior (for inspection only)
pp2.check <- get_prior(eq.m2, data=by.statement)

# Priors, set identically for both outcomes
pp2 <- c(
  set_prior("normal(0, 1.0)", class="Intercept", resp=c("einspectS", "timeS")),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="family", resp=c("einspectS", "timeS")),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="category", resp=c("einspectS", "timeS")),
  set_prior("gamma(0.01, 0.01)", class="sigma", resp=c("einspectS", "timeS"))
)

3.5 Fitting \(m_2\)

Let’s fit \(m_2\) and check the fit.

Prior checks:

We fit model \(m_2\).

## Start sampling
## Running MCMC with 4 chains, at most 8 in parallel...
## 
## Chain 2 finished in 76.4 seconds.
## Chain 4 finished in 77.2 seconds.
## Chain 1 finished in 78.0 seconds.
## Chain 3 finished in 82.7 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 78.6 seconds.
## Total execution time: 82.8 seconds.

Diagnostics:

## [1] 0
## [1] 1.004241
## [1] 0.3730994

Posterior checks:

We don’t notice a clear improvement compared to \(m_1\). Let’s compare the two models using LOO.

## Output of model 'm1':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -2485.8  97.2
## p_loo        46.2   6.8
## looic      4971.7 194.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     942   99.7%   749       
##  (0.5, 0.7]   (ok)         2    0.2%   415       
##    (0.7, 1]   (bad)        1    0.1%   104       
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'm2':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -2479.7  98.4
## p_loo        47.6   7.1
## looic      4959.4 196.9
## ------
## Monte Carlo SE of elpd_loo is 0.2.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     944   99.9%   346       
##  (0.5, 0.7]   (ok)         1    0.1%   163       
##    (0.7, 1]   (bad)        0    0.0%   <NA>      
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##    elpd_diff se_diff
## m2  0.0       0.0   
## m1 -6.2       2.1

\(m_1\)’s score is about 2.95 standard errors worse than \(m_2\)’s (6.2 / 2.1), which is a substantial difference in favor of \(m_2\) in terms of predictive capabilities.
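The arithmetic behind this comparison is simply the ratio between the difference in expected log predictive density and its standard error. A small sketch, with the two values copied from the model comparison table:

```r
# Values copied from the "Model comparisons" table (row m1)
elpd_diff <- -6.2
se_diff   <- 2.1

# Size of the difference in units of its standard error
ratio <- abs(elpd_diff) / se_diff
round(ratio, 2)
```

The larger this ratio, the less likely it is that the difference between the two models is just noise in the LOO estimates.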

3.6 Analyzing \(m_2\)

##  Family: MV(gaussian, gaussian) 
##   Links: mu = log; sigma = identity
##          mu = log; sigma = identity 
## Formula: einspectS ~ 1 + (1 | p | family) + (1 | q | category) 
##          timeS ~ 1 + (1 | p | family) + (1 | q | category) 
##    Data: by.statement (Number of observations: 945) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~category (Number of levels: 4) 
##                                          Estimate Est.Error l-95% CI u-95% CI
## sd(einspectS_Intercept)                      0.67      0.12     0.46     0.93
## sd(timeS_Intercept)                          0.65      0.12     0.43     0.91
## cor(einspectS_Intercept,timeS_Intercept)    -0.11      0.24    -0.54     0.36
##                                          Rhat Bulk_ESS Tail_ESS
## sd(einspectS_Intercept)                  1.00     3839     3353
## sd(timeS_Intercept)                      1.00     3874     2820
## cor(einspectS_Intercept,timeS_Intercept) 1.00     2905     2639
## 
## ~family (Number of levels: 4) 
##                                          Estimate Est.Error l-95% CI u-95% CI
## sd(einspectS_Intercept)                      0.90      0.12     0.69     1.14
## sd(timeS_Intercept)                          0.76      0.12     0.54     1.02
## cor(einspectS_Intercept,timeS_Intercept)     0.03      0.20    -0.37     0.40
##                                          Rhat Bulk_ESS Tail_ESS
## sd(einspectS_Intercept)                  1.00     3466     2726
## sd(timeS_Intercept)                      1.00     3558     2512
## cor(einspectS_Intercept,timeS_Intercept) 1.00     2932     2497
## 
## Population-Level Effects: 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## einspectS_Intercept    -2.11      0.52    -3.15    -1.11 1.00     1983     2230
## timeS_Intercept        -2.24      0.49    -3.20    -1.27 1.00     2504     2741
## 
## Family Specific Parameters: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_einspectS     0.86      0.02     0.82     0.90 1.00     5117     3372
## sigma_timeS         0.92      0.02     0.88     0.96 1.00     4825     2688
## 
## Residual Correlations: 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## rescor(einspectS,timeS)     0.06      0.03    -0.00     0.13 1.00     4792
##                         Tail_ESS
## rescor(einspectS,timeS)     2982
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

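By category, there is a slight inverse correlation (-0.11) between the two outcomes einspect and time, although its 95% interval (-0.54 to 0.36) comfortably includes zero; this correlation all but disappears (0.03) if we look at the family terms. The residual correlation (0.06) is even a bit lower than in \(m_1\) (0.07).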

Let’s now perform an effects analysis on the fitted coefficients of \(m_2\). First we introduce a summary function suitable for varying effects models.

Then we use the summary function to analyze the effects of the FL techniques.

## $ints
##             MBFL        PS        ST      SBFL
## |0.5  -2.4606500 1.1081400 1.1413275 -2.949263
## |0.7  -2.7027925 0.9450821 0.9713919 -3.200388
## |0.9  -3.1127090 0.6570809 0.6771380 -3.654665
## |0.95 -3.3454335 0.5066452 0.5340358 -3.930798
## |0.99 -3.8695932 0.2231197 0.2725894 -4.369491
## 0.99| -0.6635308 2.5672021 2.6205916 -1.047415
## 0.95| -0.9485867 2.2916865 2.3202460 -1.387823
## 0.9|  -1.0984565 2.1301105 2.1644880 -1.559788
## 0.7|  -1.4229790 1.8533755 1.8831670 -1.912237
## 0.5|  -1.6262350 1.6947850 1.7288225 -2.118677
## 
## $est
##      MBFL        PS        ST      SBFL 
## -2.060815  1.400548  1.433388 -2.548815
## $ints
##            MBFL          PS          ST       SBFL
## |0.5  1.0946800  0.22141375 -1.70961000 -2.2593425
## |0.7  0.9465640  0.06355268 -1.94318600 -2.4973660
## |0.9  0.6893988 -0.23781360 -2.42553900 -2.9487880
## |0.95 0.5458556 -0.40320322 -2.64475950 -3.2187678
## |0.99 0.2741525 -0.74160358 -3.19923510 -3.7255032
## 0.99| 2.5363547  1.75528050 -0.09891588 -0.5405140
## 0.95| 2.2093073  1.39546375 -0.34315037 -0.7979575
## 0.9|  2.0756000  1.24684350 -0.47696785 -0.9458480
## 0.7|  1.7981860  0.98256800 -0.76741045 -1.2691605
## 0.5|  1.6436150  0.82162225 -0.95718375 -1.4560650
## 
## $est
##       MBFL         PS         ST       SBFL 
##  1.3723869  0.5184232 -1.3600563 -1.8793254

The results are generally consistent with those of model \(m_1\), although some effects slightly weaken or strengthen.

Let’s see what happens for the category of projects.

## $ints
##              CL        DEV          DS        WEB
## |0.5  0.8973230 -1.6409625  0.19578975 -1.4068925
## |0.7  0.7781830 -1.8274915  0.06527855 -1.6149360
## |0.9  0.5577576 -2.2052000 -0.16398785 -1.9760895
## |0.95 0.4281495 -2.3890668 -0.27913252 -2.1481097
## |0.99 0.2105576 -2.7207191 -0.50242412 -2.5928487
## 0.99| 2.0780029 -0.2655134  1.41146000  0.1100508
## 0.95| 1.8353168 -0.4866055  1.14676950 -0.1986511
## 0.9|  1.7212210 -0.5962981  1.02549050 -0.3344486
## 0.7|  1.4851615 -0.8451007  0.80029395 -0.5932893
## 0.5|  1.3611725 -1.0031125  0.66661650 -0.7507050
## 
## $est
##         CL        DEV         DS        WEB 
##  1.1290573 -1.3369545  0.4319761 -1.0976749
## $ints
##               CL         DEV         DS         WEB
## |0.5  -1.7739200  0.20136575 0.74401400 -1.13223000
## |0.7  -1.9769675  0.06991984 0.61347920 -1.34135950
## |0.9  -2.3588815 -0.17846145 0.39368695 -1.70685700
## |0.95 -2.5783037 -0.29627345 0.27697138 -1.89915975
## |0.99 -2.9636419 -0.54494419 0.03735386 -2.40549125
## 0.99| -0.3200830  1.42128610 1.91708435  0.30014064
## 0.95| -0.6354105  1.17446125 1.69174025  0.09446047
## 0.9|  -0.7477540  1.04970900 1.56658950 -0.06868987
## 0.7|  -0.9911982  0.82799765 1.33784450 -0.32573750
## 0.5|  -1.1411575  0.69177800 1.20483000 -0.47806200
## 
## $est
##         CL        DEV         DS        WEB 
## -1.4791651  0.4460166  0.9762418 -0.8246671

Here we see some differences, which may partly be due to the fact that \(m_2\) models the different categories more uniformly.

3.7 Model \(m_3\): interactions

Now, let’s try a variant of \(m_2\) where we go back to fixed intercepts but add an interaction term between family of FL techniques and category of projects.

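# As in m1, plus family effects that vary by category (the interaction);
# the ID r correlates these varying effects across the two outcomes
eq.m3 <- brmsformula(
  mvbind(einspectS, timeS) ~ 
    0 + family + category + (0 + family|r|category),
  family=brmsfamily("gaussian", link="log")
) + set_rescor(TRUE)

# List the parameters that accept a prior (for inspection only)
pp3.check <- get_prior(eq.m3, data=by.statement)

pp3 <- c(
  set_prior("normal(0, 1.0)", class="b", resp=c("einspectS", "timeS")),
  set_prior("gamma(0.01, 0.01)", class="sigma", resp=c("einspectS", "timeS")),
  set_prior("lkj(1)", class="cor"),  # uniform over correlation matrices
  set_prior("weibull(2, 0.3)", class="sd", resp=c("einspectS", "timeS"))
)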

3.8 Fitting \(m_3\)

Let’s fit \(m_3\) and check the fit.

Prior checks:

We fit model \(m_3\).

## Start sampling
## Running MCMC with 4 chains, at most 8 in parallel...
## 
## Chain 4 finished in 141.8 seconds.
## Chain 2 finished in 142.7 seconds.
## Chain 3 finished in 144.9 seconds.
## Chain 1 finished in 152.1 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 145.4 seconds.
## Total execution time: 152.3 seconds.

Diagnostics:

## [1] 0
## [1] 1.003471
## [1] 0.2959395

Posterior checks:

This is in line with what we have seen before, possibly a bit better.

Model comparison:

## Output of model 'm1':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -2485.8  97.2
## p_loo        46.2   6.8
## looic      4971.7 194.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     942   99.7%   749       
##  (0.5, 0.7]   (ok)         2    0.2%   415       
##    (0.7, 1]   (bad)        1    0.1%   104       
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'm2':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -2479.7  98.4
## p_loo        47.6   7.1
## looic      4959.4 196.9
## ------
## Monte Carlo SE of elpd_loo is 0.2.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     944   99.9%   346       
##  (0.5, 0.7]   (ok)         1    0.1%   163       
##    (0.7, 1]   (bad)        0    0.0%   <NA>      
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'm3':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -2476.5  97.4
## p_loo        53.5   8.1
## looic      4953.0 194.9
## ------
## Monte Carlo SE of elpd_loo is 0.2.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     942   99.7%   628       
##  (0.5, 0.7]   (ok)         3    0.3%   127       
##    (0.7, 1]   (bad)        0    0.0%   <NA>      
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##    elpd_diff se_diff
## m3  0.0       0.0   
## m2 -3.2       6.5   
## m1 -9.3       6.3

\(m_2\)’s score is less than half a standard error worse than \(m_3\)’s (3.2 / 6.5 ≈ 0.49). This is a negligible difference, not worth the additional complexity of model \(m_3\). Thus, we stick with \(m_2\) as our selected model.

3.9 Analyzing \(m_3\)

##  Family: MV(gaussian, gaussian) 
##   Links: mu = log; sigma = identity
##          mu = log; sigma = identity 
## Formula: einspectS ~ 0 + family + category + (0 + family | r | category) 
##          timeS ~ 0 + family + category + (0 + family | r | category) 
##    Data: by.statement (Number of observations: 945) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~category (Number of levels: 4) 
##                                                Estimate Est.Error l-95% CI
## sd(einspectS_familyMBFL)                           0.28      0.15     0.05
## sd(einspectS_familyPS)                             0.27      0.14     0.05
## sd(einspectS_familyST)                             0.33      0.14     0.09
## sd(einspectS_familySBFL)                           0.29      0.15     0.05
## sd(timeS_familyMBFL)                               0.52      0.14     0.26
## sd(timeS_familyPS)                                 0.39      0.16     0.10
## sd(timeS_familyST)                                 0.27      0.14     0.05
## sd(timeS_familySBFL)                               0.28      0.15     0.05
## cor(einspectS_familyMBFL,einspectS_familyPS)      -0.01      0.33    -0.61
## cor(einspectS_familyMBFL,einspectS_familyST)      -0.05      0.33    -0.68
## cor(einspectS_familyPS,einspectS_familyST)         0.01      0.32    -0.60
## cor(einspectS_familyMBFL,einspectS_familySBFL)     0.05      0.33    -0.60
## cor(einspectS_familyPS,einspectS_familySBFL)      -0.01      0.34    -0.64
## cor(einspectS_familyST,einspectS_familySBFL)      -0.05      0.34    -0.68
## cor(einspectS_familyMBFL,timeS_familyMBFL)         0.06      0.33    -0.59
## cor(einspectS_familyPS,timeS_familyMBFL)           0.10      0.30    -0.51
## cor(einspectS_familyST,timeS_familyMBFL)          -0.25      0.28    -0.74
## cor(einspectS_familySBFL,timeS_familyMBFL)         0.07      0.32    -0.57
## cor(einspectS_familyMBFL,timeS_familyPS)           0.07      0.33    -0.59
## cor(einspectS_familyPS,timeS_familyPS)             0.00      0.32    -0.60
## cor(einspectS_familyST,timeS_familyPS)            -0.18      0.31    -0.73
## cor(einspectS_familySBFL,timeS_familyPS)           0.08      0.33    -0.57
## cor(timeS_familyMBFL,timeS_familyPS)               0.27      0.29    -0.36
## cor(einspectS_familyMBFL,timeS_familyST)           0.03      0.33    -0.62
## cor(einspectS_familyPS,timeS_familyST)            -0.01      0.33    -0.63
## cor(einspectS_familyST,timeS_familyST)            -0.01      0.33    -0.64
## cor(einspectS_familySBFL,timeS_familyST)           0.03      0.33    -0.61
## cor(timeS_familyMBFL,timeS_familyST)               0.00      0.32    -0.61
## cor(timeS_familyPS,timeS_familyST)                 0.02      0.34    -0.63
## cor(einspectS_familyMBFL,timeS_familySBFL)         0.04      0.33    -0.60
## cor(einspectS_familyPS,timeS_familySBFL)          -0.02      0.33    -0.63
## cor(einspectS_familyST,timeS_familySBFL)          -0.02      0.34    -0.65
## cor(einspectS_familySBFL,timeS_familySBFL)         0.03      0.33    -0.61
## cor(timeS_familyMBFL,timeS_familySBFL)             0.02      0.33    -0.63
## cor(timeS_familyPS,timeS_familySBFL)               0.04      0.33    -0.60
## cor(timeS_familyST,timeS_familySBFL)               0.04      0.34    -0.61
##                                                u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(einspectS_familyMBFL)                           0.61 1.00     3603     1884
## sd(einspectS_familyPS)                             0.58 1.00     2415     2138
## sd(einspectS_familyST)                             0.62 1.00     2216     1934
## sd(einspectS_familySBFL)                           0.63 1.00     3985     2253
## sd(timeS_familyMBFL)                               0.81 1.00     3409     2193
## sd(timeS_familyPS)                                 0.71 1.00     2189     1669
## sd(timeS_familyST)                                 0.60 1.00     4092     2314
## sd(timeS_familySBFL)                               0.61 1.00     3464     1805
## cor(einspectS_familyMBFL,einspectS_familyPS)       0.63 1.00     4017     2761
## cor(einspectS_familyMBFL,einspectS_familyST)       0.59 1.00     3976     2796
## cor(einspectS_familyPS,einspectS_familyST)         0.59 1.00     3409     2945
## cor(einspectS_familyMBFL,einspectS_familySBFL)     0.65 1.00     5144     2572
## cor(einspectS_familyPS,einspectS_familySBFL)       0.64 1.00     4443     3294
## cor(einspectS_familyST,einspectS_familySBFL)       0.63 1.00     4031     2913
## cor(einspectS_familyMBFL,timeS_familyMBFL)         0.66 1.00     2433     2691
## cor(einspectS_familyPS,timeS_familyMBFL)           0.66 1.00     2464     3028
## cor(einspectS_familyST,timeS_familyMBFL)           0.34 1.00     2956     2790
## cor(einspectS_familySBFL,timeS_familyMBFL)         0.67 1.00     2731     3241
## cor(einspectS_familyMBFL,timeS_familyPS)           0.68 1.00     3831     2839
## cor(einspectS_familyPS,timeS_familyPS)             0.62 1.00     3043     3004
## cor(einspectS_familyST,timeS_familyPS)             0.45 1.00     3072     2988
## cor(einspectS_familySBFL,timeS_familyPS)           0.68 1.00     3426     3180
## cor(timeS_familyMBFL,timeS_familyPS)               0.76 1.00     2752     3005
## cor(einspectS_familyMBFL,timeS_familyST)           0.65 1.00     5628     2490
## cor(einspectS_familyPS,timeS_familyST)             0.62 1.00     4997     2858
## cor(einspectS_familyST,timeS_familyST)             0.61 1.00     3987     3302
## cor(einspectS_familySBFL,timeS_familyST)           0.63 1.00     3228     2694
## cor(timeS_familyMBFL,timeS_familyST)               0.62 1.00     3965     3682
## cor(timeS_familyPS,timeS_familyST)                 0.65 1.00     3157     3360
## cor(einspectS_familyMBFL,timeS_familySBFL)         0.65 1.00     5670     3219
## cor(einspectS_familyPS,timeS_familySBFL)           0.63 1.00     4610     2988
## cor(einspectS_familyST,timeS_familySBFL)           0.62 1.00     4241     3430
## cor(einspectS_familySBFL,timeS_familySBFL)         0.66 1.00     3017     2579
## cor(timeS_familyMBFL,timeS_familySBFL)             0.63 1.00     4248     3320
## cor(timeS_familyPS,timeS_familySBFL)               0.66 1.00     3117     3237
## cor(timeS_familyST,timeS_familySBFL)               0.66 1.00     2204     2885
## 
## Population-Level Effects: 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## einspectS_familyMBFL     -2.92      0.55    -4.08    -1.95 1.00     3862
## einspectS_familyPS        0.36      0.28    -0.23     0.88 1.00     1642
## einspectS_familyST        0.07      0.34    -0.68     0.63 1.00     1433
## einspectS_familySBFL     -3.37      0.53    -4.47    -2.42 1.00     3997
## einspectS_categoryDEV    -2.43      0.54    -3.57    -1.41 1.00     3525
## einspectS_categoryDS     -0.55      0.35    -1.28     0.11 1.00     1184
## einspectS_categoryWEB    -2.23      0.56    -3.40    -1.22 1.00     3344
## timeS_familyMBFL         -0.97      0.42    -1.79    -0.15 1.00     1906
## timeS_familyPS           -1.39      0.42    -2.22    -0.54 1.00     1544
## timeS_familyST           -3.17      0.54    -4.29    -2.14 1.00     3349
## timeS_familySBFL         -3.79      0.52    -4.84    -2.80 1.00     3302
## timeS_categoryDEV         0.34      0.50    -0.69     1.26 1.00     1652
## timeS_categoryDS          0.42      0.49    -0.60     1.32 1.00     1748
## timeS_categoryWEB        -1.30      0.66    -2.70    -0.14 1.00     3724
##                       Tail_ESS
## einspectS_familyMBFL      2896
## einspectS_familyPS        2079
## einspectS_familyST        2578
## einspectS_familySBFL      2723
## einspectS_categoryDEV     3166
## einspectS_categoryDS      2246
## einspectS_categoryWEB     2102
## timeS_familyMBFL          2045
## timeS_familyPS            2719
## timeS_familyST            2482
## timeS_familySBFL          3036
## timeS_categoryDEV         2640
## timeS_categoryDS          2138
## timeS_categoryWEB         2595
## 
## Family Specific Parameters: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_einspectS     0.85      0.02     0.81     0.89 1.00     7802     2940
## sigma_timeS         0.92      0.02     0.88     0.97 1.00     5768     3152
## 
## Residual Correlations: 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## rescor(einspectS,timeS)     0.06      0.03    -0.00     0.13 1.00     6486
##                         Tail_ESS
## rescor(einspectS,timeS)     2824
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Instead of considering the fixed and varying effects of \(m_3\), we can estimate the marginal means for each family of FL techniques (results omitted for brevity, since we’ll focus on \(m_2\) anyway).

3.10 Model \(m_4\): bug-kind-specific interaction effects

Let’s now add predictors to \(m_2\), so as to study any effect of the kinds of bugs:

  • predicate is a Boolean value that identifies predicate-related bugs
  • crashing is a Boolean value that identifies crashing bugs
  • mutability is a nonnegative score that denotes the percentage of mutants that mutate a line in a bug’s ground truth
  • ismutable is a Boolean that identifies the bugs with a positive mutability score

Since mutability/mutable likely affects both category and einspect, it makes sense to add it as a predictor, so as to close the possible backdoor path \(\textrm{category} \leftarrow \textrm{mutable} \rightarrow \textrm{einspect}\).
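To illustrate why closing a backdoor path matters, here is a minimal simulation sketch. It uses made-up binary variables and effect sizes (all hypothetical, not the study's data): a confounder drives both a category indicator and the outcome, while the category itself has no direct effect.

```r
# Hypothetical simulation (not the study's data): a binary confounder
# `mutable` influences both a binary `category` indicator and the outcome.
set.seed(1)
n <- 5000
mutable  <- rbinom(n, 1, 0.5)
category <- rbinom(n, 1, 0.2 + 0.6 * mutable)  # category <- mutable
einspect <- 2 * mutable + rnorm(n)             # mutable -> einspect, no direct category effect

# Without the confounder, category appears to be associated with the outcome.
naive <- coef(lm(einspect ~ category))[["category"]]

# Adding the confounder as a predictor closes the backdoor path.
adjusted <- coef(lm(einspect ~ category + mutable))[["category"]]

print(c(naive = naive, adjusted = adjusted))
```

In this toy setting the naive coefficient is clearly nonzero even though category has no causal effect, whereas the adjusted coefficient is close to zero.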

We are only interested in controlling for bug kind for einspect, thus we switch to a univariate model where einspect is the only outcome variable.

eq.m4.einspect <- brmsformula(einspectS ~ 1 
                              + (1|p|family) + (1|q|category) 
                              + predicate*family 
                              + crashing*family 
                              + ismutable*family,
                              family=brmsfamily("gaussian", link="log"))

eq.m4 <- eq.m4.einspect

pp4.check <- get_prior(eq.m4, data=by.statement)

pp4 <- c(
  set_prior("normal(0, 1.0)", class="Intercept"),
  set_prior("normal(0, 1.0)", class="b"),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="family"),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="category"),
  set_prior("gamma(0.01, 0.01)", class="sigma")
)
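As a rough sanity check of what these priors imply, the following sketch computes a few summaries in base R. It assumes that Stan's weibull(shape, scale) parameterization matches R's `qweibull(shape=, scale=)`, and that the normal(0, 1) priors act on the log scale because of the log link; the variable names are introduced here for illustration only.

```r
# Stan's weibull(shape, scale) corresponds to R's *weibull(shape =, scale =).
shape <- 2
scale <- 0.3

# Analytic prior mean and 95th percentile of a group-level standard deviation.
prior_sd_mean <- scale * gamma(1 + 1 / shape)
prior_sd_q95  <- qweibull(0.95, shape = shape, scale = scale)

# normal(0, 1) on the log scale: central 95% range of exp(Intercept),
# i.e. of the modelled mean when all other terms are zero.
intercept_range <- exp(qnorm(c(0.025, 0.975), mean = 0, sd = 1))

print(round(c(sd_mean = prior_sd_mean, sd_q95 = prior_sd_q95), 3))
print(round(intercept_range, 2))
```

Under these assumptions, the priors keep group-level standard deviations well below 1 while allowing the modelled mean to vary over roughly two orders of magnitude.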

3.11 Fitting \(m_4\)

Prior checks:

We fit model \(m_4\).

## Start sampling
## Running MCMC with 4 chains, at most 8 in parallel...
## 
## Chain 4 finished in 29.8 seconds.
## Chain 2 finished in 30.9 seconds.
## Chain 3 finished in 32.0 seconds.
## Chain 1 finished in 38.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 32.7 seconds.
## Total execution time: 38.3 seconds.

Diagnostics:

## [1] 0
## [1] 1.003327
## [1] 0.3383162

Posterior checks:

Since \(m_4\) uses less data than the previous models (it doesn’t consider the outcome time), we cannot compare it to the other models using LOO (or any information criterion, for that matter).

3.12 Analyzing \(m_4\)

##  Family: gaussian 
##   Links: mu = log; sigma = identity 
## Formula: einspectS ~ 1 + (1 | p | family) + (1 | q | category) + predicate * family + crashing * family + ismutable * family 
##    Data: by.statement (Number of observations: 945) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~category (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.70      0.12     0.48     0.95 1.00     4044     3090
## 
## ~family (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.46      0.20     0.10     0.85 1.00     1799     1862
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                   -2.67      0.65    -3.88    -1.36 1.00     2428
## predicateTRUE               -0.46      0.50    -1.45     0.50 1.00     2012
## familyPS                     1.62      0.68     0.22     2.87 1.00     2388
## familyST                     1.89      0.69     0.38     3.12 1.00     2314
## familySBFL                  -1.20      0.70    -2.57     0.17 1.00     4081
## crashingTRUE                -1.15      0.52    -2.18    -0.15 1.00     2799
## ismutableTRUE               -0.25      0.48    -1.16     0.69 1.00     1965
## predicateTRUE:familyPS      -0.07      0.52    -1.06     0.94 1.00     2010
## predicateTRUE:familyST       0.48      0.50    -0.49     1.48 1.00     1980
## predicateTRUE:familySBFL    -0.02      0.86    -1.73     1.66 1.00     5036
## familyPS:crashingTRUE        1.01      0.54    -0.02     2.06 1.00     2822
## familyST:crashingTRUE       -1.82      0.65    -3.12    -0.57 1.00     3334
## familySBFL:crashingTRUE      0.03      0.87    -1.67     1.67 1.00     4750
## familyPS:ismutableTRUE       1.01      0.49     0.02     1.96 1.00     2148
## familyST:ismutableTRUE       0.87      0.49    -0.12     1.81 1.00     1914
## familySBFL:ismutableTRUE    -0.29      0.81    -1.84     1.31 1.00     4679
##                          Tail_ESS
## Intercept                    2629
## predicateTRUE                2368
## familyPS                     2632
## familyST                     2882
## familySBFL                   2796
## crashingTRUE                 2913
## ismutableTRUE                2223
## predicateTRUE:familyPS       2540
## predicateTRUE:familyST       2460
## predicateTRUE:familySBFL     2962
## familyPS:crashingTRUE        2868
## familyST:crashingTRUE        2923
## familySBFL:crashingTRUE      2746
## familyPS:ismutableTRUE       2617
## familyST:ismutableTRUE       2329
## familySBFL:ismutableTRUE     2819
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.78      0.02     0.75     0.82 1.00     6433     2648
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s now perform an effects analysis on the fitted coefficients of m4.

Specifically, we look at the (fixed) effects of the families associated with certain categories of bugs, for response einspect.

## $ints
##       crashing MBFL crashing PS crashing ST crashing SBFL
## |0.5     -1.4957225  0.64172875  -2.2535900    -0.5706570
## |0.7     -1.6966420  0.45311365  -2.5091630    -0.8849482
## |0.9     -2.0240975  0.14002575  -2.8906505    -1.4130940
## |0.95    -2.1821472 -0.01729245  -3.1170247    -1.6737832
## |0.99    -2.5282891 -0.35063686  -3.5874488    -2.1767845
## 0.99|     0.1074132  2.44609395  -0.2224062     2.1764674
## 0.95|    -0.1510499  2.05582775  -0.5708796     1.6714243
## 0.9|     -0.3032274  1.89343600  -0.7896973     1.4436930
## 0.7|     -0.6125321  1.57059350  -1.1579595     0.9424029
## 0.5|     -0.7958315  1.37090000  -1.3684550     0.6420823
## 
## $est
## crashing MBFL   crashing PS   crashing ST crashing SBFL 
##   -1.15095905    1.00587635   -1.82331692    0.02942783
## $ints
##       predicate MBFL predicate PS predicate ST predicate SBFL
## |0.5     -0.80539225   -0.4252112   0.14267150     -0.5879013
## |0.7     -0.97924005   -0.5941629  -0.04137106     -0.8908687
## |0.9     -1.27643350   -0.8829843  -0.30878825     -1.4396120
## |0.95    -1.44913900   -1.0634027  -0.49199142     -1.7304670
## |0.99    -1.85374910   -1.3837009  -0.76560686     -2.2351072
## 0.99|     0.81469355    1.3561932   1.89115115      2.0862540
## 0.95|     0.50288860    0.9428116   1.47530875      1.6589353
## 0.9|      0.33490690    0.7920933   1.31585700      1.3988270
## 0.7|      0.05072069    0.4752352   1.00939850      0.8711154
## 0.5|     -0.12628675    0.2727390   0.81871150      0.5510417
## 
## $est
## predicate MBFL   predicate PS   predicate ST predicate SBFL 
##    -0.46324267    -0.07029723     0.48350202    -0.02474266
## $ints
##       ismutable MBFL ismutable PS ismutable ST ismutable SBFL
## |0.5     -0.56463550   0.67018100   0.54703600     -0.8342113
## |0.7     -0.74068020   0.50529050   0.35888690     -1.1446125
## |0.9     -1.02027550   0.20357500   0.05512335     -1.6622670
## |0.95    -1.16404750   0.01718417  -0.12405297     -1.8383202
## |0.99    -1.46524645  -0.30899653  -0.47210512     -2.3507601
## 0.99|     1.03952155   2.31925060   2.11852570      1.6988942
## 0.95|     0.69069932   1.95853000   1.81475750      1.3090160
## 0.9|      0.52912870   1.82134500   1.66444000      1.0380830
## 0.7|      0.24578715   1.51742650   1.36737800      0.5435591
## 0.5|      0.07147647   1.33092500   1.19420250      0.2685713
## 
## $est
## ismutable MBFL   ismutable PS   ismutable ST ismutable SBFL 
##     -0.2487351      1.0068292      0.8698194     -0.2899824

So, crashing bugs are indeed easier for ST. In contrast, predicate-related bugs do not seem to be simpler for PS.
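Because the model uses a log link, a coefficient \(\beta\) acts multiplicatively on the modelled mean of the (standardized) outcome. The sketch below converts the crashing point estimates from the `$est` output above into such factors. It assumes, based on the values coinciding with the coefficients in the model summary, that the MBFL entry is the main effect and the other entries are interaction terms relative to MBFL.

```r
# Point estimates copied from the `$est` output for `crashing` above.
est <- c(MBFL = -1.15095905, PS = 1.00587635,
         ST = -1.82331692, SBFL = 0.02942783)

# Under a log link, adding beta to the linear predictor multiplies
# the modelled mean by exp(beta).
factors <- exp(est)

print(round(factors, 3))
```

A factor below 1 corresponds to a lower (better) modelled einspectS score, a factor above 1 to a higher one.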

For mutable bugs, we don’t find any consistent association. Thus, let’s try to add to the model a finer-grained dependency on mutability rather than just the Boolean indicator ismutable.

3.13 Model \(m_5\): mutability slope (failed attempt)

A simple way would be to introduce an interaction mutability\(\times\)family.

eq.m5.einspect <- brmsformula(einspectS ~ 1 
                              + (1|p|family) + (1|q|category) 
                              + predicate*family 
                              + crashing*family 
                              + mutability*family,
                              family=brmsfamily("gaussian", link="log"))

eq.m5 <- eq.m5.einspect

pp5.check <- get_prior(eq.m5, data=by.statement)

pp5 <- c(
  set_prior("normal(0, 1.0)", class="Intercept"),
  set_prior("normal(0, 1.0)", class="b"),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="family"),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="category"),
  set_prior("gamma(0.01, 0.01)", class="sigma")
)

We could get passable (not great) prior checks, but let’s cut to the chase and fit model \(m_5\).

## Start sampling
## Running MCMC with 4 chains, at most 8 in parallel...
## 
## Chain 1 finished in 3.0 seconds.
## Chain 3 finished in 3.1 seconds.
## Chain 4 finished in 98.2 seconds.
## Chain 2 finished in 117.0 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 55.3 seconds.
## Total execution time: 117.6 seconds.
## Warning: 438 of 4000 (11.0%) transitions hit the maximum treedepth limit of 10.
## See https://mc-stan.org/misc/warnings for details.
## Warning: 2 of 4 chains have a NaN E-BFMI.
## See https://mc-stan.org/misc/warnings for details.

The first thing that we notice is that two of the four chains terminated suspiciously fast, whereas the other two went awry and spun for much longer. In addition, we got a number of scary warnings. This points to some region of the posterior that could not be sampled effectively.

Let’s see the diagnostics:

## [1] 0
## [1] 5.472564
## [1] 0.001008065

A disaster. Let’s also plot the trace plots.

Two chains are straight lines, and hence did not mix at all with the others!
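To see why chains that are stuck on straight lines inflate \(\widehat{R}\), here is a minimal base-R sketch of the classic split-\(\widehat{R}\) statistic applied to simulated chains. It is a simplified version (no rank normalization), so it is not necessarily the exact computation behind the diagnostics above; the function name `split_rhat` and the simulated data are assumptions made for illustration.

```r
# Classic split-Rhat (simplified: no rank normalization).
split_rhat <- function(chains) {
  # chains: matrix with one column per chain, one row per iteration
  n <- nrow(chains)
  half <- floor(n / 2)
  # Split every chain into two halves, giving 2 * ncol(chains) sequences.
  halves <- cbind(chains[1:half, , drop = FALSE],
                  chains[(n - half + 1):n, , drop = FALSE])
  W <- mean(apply(halves, 2, var))     # within-sequence variance
  B <- half * var(colMeans(halves))    # between-sequence variance
  var_plus <- (half - 1) / half * W + B / half
  sqrt(var_plus / W)
}

set.seed(42)
good <- matrix(rnorm(4000), ncol = 4)       # four well-mixed chains
stuck <- good
stuck[, 1] <- 3 + rnorm(1000, sd = 1e-3)    # chain frozen near 3
stuck[, 2] <- -3 + rnorm(1000, sd = 1e-3)   # chain frozen near -3

rhat_good  <- split_rhat(good)
rhat_stuck <- split_rhat(stuck)
print(c(good = rhat_good, stuck = rhat_stuck))
```

When two chains barely move, the between-chain variance dwarfs the within-chain variance, which pushes the statistic far above 1.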

Notice that the distribution of mutability is very skewed, which explains the difficulties in fitting \(m_5\).

3.14 Model \(m_6\): mutability slope (successful attempt)

The most straightforward way out of this ditch is to simply log-transform mutability (after adding 1 to all percentages so that all logs are defined).

by.statement$logmutability <- log(1 + by.statement$mutability)

eq.m6.einspect <- brmsformula(einspectS ~ 1 
                              + (1|p|family) + (1|q|category) 
                              + predicate*family 
                              + crashing*family 
                              + logmutability*family,
                              family=brmsfamily("gaussian", link="log"))

eq.m6 <- eq.m6.einspect

pp6.check <- get_prior(eq.m6, data=by.statement)

pp6 <- c(
  set_prior("normal(0, 1.0)", class="Intercept"),
  set_prior("normal(0, 1.0)", class="b"),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="family"),
  set_prior("weibull(2, 0.3)", class="sd", coef="Intercept", 
            group="category"),
  set_prior("gamma(0.01, 0.01)", class="sigma")
)
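The effect of the log(1 + x) transform used for logmutability above can be illustrated on a small right-skewed vector. The values below are hypothetical, not the actual mutability scores, and the `skewness` helper is defined here only for the example.

```r
# Hypothetical right-skewed scores (not the actual mutability values).
x <- c(0, 0, 0, 0.5, 1, 2, 5, 12, 40)

# Sample skewness: standardized third central moment.
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Same transform as in the definition of logmutability;
# log1p(x) is the numerically stable equivalent of log(1 + x).
logx <- log(1 + x)

print(c(raw = skewness(x), logged = skewness(logx)))
```

The transform keeps zeros at zero and compresses the long right tail, which is what makes the slope easier to sample.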

Alternative ways to modify \(m_5\) so that it can be analyzed (which we mention but don’t further explore here):

  • Introducing a multi-level term, with einspect ~ log(x)*family, and log(x) = log(y) + a, where \(x/y = \textrm{mutability}\). This is based on rewriting \(\log(x/y) = a\) into \(\log(x) = a + \log(y)\).

  • The approach followed in this paper.

3.15 Fitting \(m_6\)

Prior checks:

We fit model \(m_6\).

## Start sampling
## Running MCMC with 4 chains, at most 8 in parallel...
## 
## Chain 3 finished in 29.2 seconds.
## Chain 4 finished in 30.4 seconds.
## Chain 2 finished in 31.7 seconds.
## Chain 1 finished in 33.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 31.1 seconds.
## Total execution time: 33.4 seconds.

Diagnostics:

## [1] 0
## [1] 1.003259
## [1] 0.4447736

Posterior checks:

Everything is A-OK now.

Let’s compare the models \(m_4\) and \(m_6\) using LOO.

## Output of model 'm4':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -1143.7  57.6
## p_loo        62.6   9.7
## looic      2287.5 115.2
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     938   99.3%   286       
##  (0.5, 0.7]   (ok)         6    0.6%   155       
##    (0.7, 1]   (bad)        1    0.1%   45        
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'm6':
## 
## Computed from 4000 by 945 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo  -1155.0  62.4
## p_loo        52.5   8.7
## looic      2310.0 124.8
## ------
## Monte Carlo SE of elpd_loo is 0.2.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     940   99.5%   404       
##  (0.5, 0.7]   (ok)         5    0.5%   118       
##    (0.7, 1]   (bad)        0    0.0%   <NA>      
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##    elpd_diff se_diff
## m4   0.0       0.0  
## m6 -11.3      18.3

\(m_6\) and \(m_4\) are very close in terms of predictive capabilities: the difference in elpd (-11.3) is smaller in magnitude than its standard error (18.3).
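One simple way to read the model comparison output above is to express the elpd difference in units of its standard error. The sketch below does this with the reported numbers; the normal approximation used for the rough interval is an assumption, not something the output itself states.

```r
# Values copied from the "Model comparisons" output above.
elpd_diff <- -11.3
se_diff   <- 18.3

# Difference expressed in units of its standard error.
z <- elpd_diff / se_diff

# Rough 95% interval for the difference under a normal approximation.
interval <- elpd_diff + c(-1, 1) * qnorm(0.975) * se_diff

print(round(c(z = z, lower = interval[1], upper = interval[2]), 2))
```

An interval that comfortably straddles zero gives no clear reason to prefer either model on predictive grounds.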

3.16 Analyzing \(m_6\)

##  Family: gaussian 
##   Links: mu = log; sigma = identity 
## Formula: einspectS ~ 1 + (1 | p | family) + (1 | q | category) + predicate * family + crashing * family + logmutability * family 
##    Data: by.statement (Number of observations: 945) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~category (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.71      0.12     0.49     0.97 1.00     3682     3030
## 
## ~family (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.48      0.18     0.14     0.84 1.00     2042     2078
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                   -2.39      0.63    -3.63    -1.16 1.00     2389
## predicateTRUE               -0.26      0.49    -1.22     0.69 1.00     2375
## familyPS                     1.60      0.67     0.17     2.80 1.00     2430
## familyST                     1.90      0.66     0.48     3.12 1.00     2150
## familySBFL                  -1.28      0.71    -2.67     0.12 1.00     3825
## crashingTRUE                -1.19      0.53    -2.25    -0.19 1.00     2228
## logmutability               -0.53      0.35    -1.28     0.11 1.00     1944
## predicateTRUE:familyPS       0.06      0.50    -0.94     1.03 1.00     2371
## predicateTRUE:familyST       0.58      0.50    -0.38     1.55 1.00     2406
## predicateTRUE:familySBFL    -0.08      0.85    -1.78     1.54 1.00     4831
## familyPS:crashingTRUE        1.06      0.54     0.05     2.15 1.00     2270
## familyST:crashingTRUE       -1.79      0.68    -3.16    -0.49 1.00     3195
## familySBFL:crashingTRUE      0.01      0.84    -1.68     1.61 1.00     4947
## familyPS:logmutability       0.63      0.35    -0.00     1.36 1.00     1947
## familyST:logmutability       0.52      0.35    -0.13     1.26 1.00     1954
## familySBFL:logmutability    -0.11      0.63    -1.46     0.99 1.00     3366
##                          Tail_ESS
## Intercept                    2811
## predicateTRUE                2740
## familyPS                     2606
## familyST                     2775
## familySBFL                   2990
## crashingTRUE                 2284
## logmutability                2136
## predicateTRUE:familyPS       2542
## predicateTRUE:familyST       2562
## predicateTRUE:familySBFL     3103
## familyPS:crashingTRUE        2604
## familyST:crashingTRUE        2819
## familySBFL:crashingTRUE      3009
## familyPS:logmutability       2147
## familyST:logmutability       2130
## familySBFL:logmutability     2465
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.80      0.02     0.76     0.83 1.00     4795     3099
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## $ints
##       logmutability MBFL logmutability PS logmutability ST logmutability SBFL
## |0.5         -0.75474450      0.375940500      0.266247750         -0.5156230
## |0.7         -0.88416690      0.254462550      0.147502000         -0.7639471
## |0.8         -0.97204130      0.182238400      0.078897560         -0.9411668
## |0.87        -1.05968925      0.115193315      0.008621432         -1.1116126
## |0.9         -1.12050650      0.085546755     -0.027511595         -1.2257940
## |0.95        -1.28001225     -0.004312041     -0.130285675         -1.4553477
## |0.99        -1.47134080     -0.178970675     -0.261238665         -1.8345544
## 0.99|         0.25625661      1.564178850      1.469844800          1.3301171
## 0.95|         0.10875320      1.362056500      1.258613250          0.9929937
## 0.9|          0.01516966      1.221132000      1.107307500          0.8401534
## 0.87|        -0.02294809      1.163467750      1.050067700          0.7640344
## 0.8|         -0.09323496      1.068632000      0.955611000          0.6521943
## 0.7|         -0.16356845      0.982931800      0.874557150          0.5276751
## 0.5|         -0.27993625      0.855096250      0.749353000          0.3318090
## 
## $est
## logmutability MBFL   logmutability PS   logmutability ST logmutability SBFL 
##         -0.5291964          0.6255506          0.5170813         -0.1092753

There is a weak tendency for MBFL to do better on mutable bugs, but it can only be detected with 87% confidence (which is still decent). Incidentally, PS (and, to a lesser degree, ST) tends to perform worse on the same kinds of bugs, whereas SBFL is agnostic.
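A confidence level such as the one quoted above can be read as the widest central (equal-tailed) posterior interval that still excludes zero. The following sketch computes it from a vector of draws. Since the actual posterior draws are not reproduced here, it uses a hypothetical stand-in: a normal distribution with the estimate and error reported for logmutability in the summary, evaluated on a deterministic quantile grid. The helper name `max_level_excluding_zero` is introduced for illustration.

```r
# Largest central (equal-tailed) interval level whose bounds exclude zero.
max_level_excluding_zero <- function(draws) {
  p_neg <- mean(draws < 0)
  2 * max(p_neg, 1 - p_neg) - 1
}

# Hypothetical stand-in for posterior draws: normal with the summary's
# estimate (-0.53) and error (0.35), on a deterministic quantile grid.
draws <- qnorm(ppoints(4000), mean = -0.53, sd = 0.35)

level <- max_level_excluding_zero(draws)
print(round(level, 2))
```

Under this normal approximation the resulting level lands close to the 87% obtained from the interval table.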

Finally, let’s also collect the estimates and intervals of the varying intercepts for the group-level terms family and category. In \(m_6\) these now correspond to the effects on bugs that are in none of the special categories (crashing, predicate, mutable); since this is a relatively small set, we don’t expect any very strong tendency (simply because the data is limited).

## $ints
##              MBFL          PS           ST        SBFL
## |0.5  -1.09837750 -0.03866348  0.008106222 -1.03776500
## |0.7  -1.31961950 -0.16916845 -0.123050900 -1.32181800
## |0.9  -1.79294900 -0.45615560 -0.354203300 -1.84372150
## |0.95 -2.05672725 -0.59820110 -0.500310250 -2.13864925
## |0.99 -2.59430055 -0.91827276 -0.792044045 -2.81509720
## 0.99|  0.31546326  1.81253300  1.835731450  0.50949495
## 0.95|  0.12512723  1.29965575  1.372258250  0.24451078
## 0.9|   0.03854548  1.09694800  1.147540000  0.13960190
## 0.7|  -0.16430215  0.72556065  0.774333900 -0.05992452
## 0.5|  -0.31820900  0.52408350  0.571569000 -0.19567100
## 
## $est
##       MBFL         PS         ST       SBFL 
## -0.7456581  0.2577865  0.3136117 -0.6669348
## $ints
##                CL        DEV         DS        WEB
## |0.5   0.60921450 -1.9213750  0.2900920 -1.8046375
## |0.7   0.47578020 -2.1417310  0.1576821 -2.0088575
## |0.9   0.21938015 -2.5400950 -0.1101333 -2.3996800
## |0.95  0.02653933 -2.7496218 -0.2652119 -2.6116985
## |0.99 -0.29597590 -3.1878640 -0.6054692 -2.9940812
## 0.99|  1.81451660 -0.4967978  1.4707381 -0.2732759
## 0.95|  1.56997425 -0.7112183  1.2597173 -0.5808576
## 0.9|   1.45210450 -0.8386855  1.1383780 -0.7016571
## 0.7|   1.23746050 -1.1045085  0.9314273 -0.9619804
## 0.5|   1.11337000 -1.2482250  0.7968602 -1.1282750
## 
## $est
##         CL        DEV         DS        WEB 
##  0.8508594 -1.6101808  0.5346215 -1.4848656

4 Summary plots

Let’s prepare and print some plots of the overall results for model \(m_6\).

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

5 Dump all data and plots

## Saving 7 x 5 in image
## [1] "paper/m2-family.pdf"
## Saving 7 x 5 in image
## [1] "paper/m2-category.pdf"
## Saving 7 x 5 in image
## [1] "paper/m6-crashing.pdf"
## Saving 7 x 5 in image
## [1] "paper/m6-predicate.pdf"
## Saving 7 x 5 in image
## [1] "paper/m6-mutable.pdf"